Diverse Context for Learning Word Representations
Abstract
Word representations are mathematical objects that capture a word's meaning and its grammatical properties in a form that computers can process. They map words into equivalence classes such that words with similar properties fall into the same class. Word representations are either constructed manually by humans (in the form of lexicons, dictionaries, etc.) or obtained automatically using unsupervised learning algorithms. Since manual construction is expensive and does not scale, obtaining word representations automatically is desirable. Traditionally, automatic learning of word representations has relied on the distributional hypothesis, which states that the meaning of a word is evidenced by the words that occur in its context (Harris, 1954). Thus, existing word representation learning algorithms such as latent semantic analysis (Deerwester et al., 1990; Landauer and Dumais, 1997) derive word meaning from aggregated co-occurrence counts of words extracted from unlabeled monolingual corpora. In this thesis, we diversify the notion of context to include information beyond the monolingual distributional context. We show that information about word meaning is present in other contexts: neighboring words in a semantic lexicon, the contexts of a word across different languages, and the morphological structure of the word. We show that, in addition to the monolingual distributional context, these sources provide complementary information about word meaning, which can substantially improve the quality of word representations. We present methods to augment existing models of word representations to incorporate these knowledge sources.
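The contrast drawn in the abstract, between vectors built only from monolingual co-occurrence counts and vectors that also draw on a semantic lexicon, can be illustrated with a small sketch. This is a hypothetical illustration, not the thesis's implementation: it counts the words within a fixed window (two positions on each side is an assumption), compares words by cosine similarity, and then applies a retrofitting-style update that pulls each vector toward the average of its lexicon neighbours while keeping it close to its original counts. It works on raw counts and omits the singular value decomposition that latent semantic analysis applies to the count matrix. The function names, the weights `alpha` and `beta`, and the toy corpus and lexicon are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrofit(vectors, lexicon, iterations=10, alpha=1.0, beta=1.0):
    """Hypothetical retrofitting-style update: pull each vector toward its lexicon
    neighbours (weight beta) while staying close to its original vector (weight alpha)."""
    original = {w: dict(v) for w, v in vectors.items()}
    current = {w: dict(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            neighbours = [n for n in neighbours if n in current]
            if word not in current or not neighbours:
                continue  # words without usable lexicon neighbours stay unchanged
            keys = set(original[word]).union(*(current[n] for n in neighbours))
            total = alpha + beta * len(neighbours)
            # Weighted average of the original vector and the current neighbour vectors.
            current[word] = {
                k: (alpha * original[word].get(k, 0.0)
                    + beta * sum(current[n].get(k, 0.0) for n in neighbours)) / total
                for k in keys
            }
    return current


corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased a mouse",
    "a dog chased a ball",
    "a kitten slept",
]
lexicon = {"cat": ["kitten"], "kitten": ["cat"]}  # toy stand-in for a semantic lexicon

vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))     # 1.0: identical contexts
print(round(cosine(vecs["cat"], vecs["kitten"]), 3))  # 0.5: they share only "a"
retro = retrofit(vecs, lexicon)
print(round(cosine(retro["cat"], retro["kitten"]), 3))  # higher after retrofitting
```

In this toy corpus, "cat" and "dog" occur in identical contexts, so their count vectors coincide, whereas "cat" and "kitten" share only the context word "a". Listing them as neighbours in the lexicon raises their similarity after the update, while "dog", which has no lexicon entry, is left unchanged: the lexicon contributes information that the co-occurrence counts alone do not contain.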
Similar resources
The Effect of Three Vocabulary Learning Strategies of Word-part, Word-card and Context-clue on Iranian High School Students’ Immediate and Delayed English Vocabulary Learning and Retention
The present study was an attempt to compare the effect of three VLSs, namely the word-part strategy, the word-card strategy and the context-clue strategy, on immediate and delayed English vocabulary retention of Iranian third-grade high school students. To this end, 90 students from three high schools in Tabriz, in three intact groups, served as the participants of the study. In order to en...
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also considered a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
Word Type Effects on L2 Word Retrieval and Learning: Homonym versus Synonym Vocabulary Instruction
The purpose of this study was twofold: (a) to assess the retention of two word types (synonyms and homonyms) in short-term memory, and (b) to investigate the effect of these word types on word learning by asking learners to learn their Persian meanings. A total of 73 Iranian language learners studying English translation participated in the study. For the first purpose, 36 freshmen from an ...
Learning Syntactic Categories Using Paradigmatic Representations of Word Context
We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on potential substitutes of a word in contrast to syntagmatic representations based on properties of neighboring words. We compare a bigram based baseline model with several paradigmatic models and demonstrate significan...
The role of discourse context in developing word form representations: a paradoxical relation between reading and learning.
To acquire representations of printed words, children must attend to the written form of a word and link this form with the word's pronunciation. When words are read in context, they may be read with less attention to these features, and this can lead to poorer word form retention. Two experiments with young children (ages 5-8 years) confirmed this hypothesis. In our experiments, children attem...